Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Authors

  • Randy Gomez
  • Keisuke Nakamura
  • Kazuhiro Nakadai

Abstract

Distant-talking voice-based HCI systems suffer from performance degradation caused by a mismatch between the speech acoustics observed at runtime and the acoustic model learned in training. The mismatch arises from changes in the power of the speech signal as observed at the microphones. These changes depend strongly on the speaker's distance, which affects how speech propagates inside the room before it reaches the microphones. Moreover, as the speech signal is reflected, its acoustic characteristics are also altered by the room's properties. In general, distance-induced power mismatch is a complex problem. This paper presents a novel approach to dealing with distance-induced mismatch by sensing instantaneous variations in voice power and compensating the model parameters accordingly. First, the distant-talking speech signal is processed with microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room properties, are used to predict the optimal distance of the speech source. The pre-computed statistical priors corresponding to that optimal distance are then selected to correct the statistics of the generic model, which was fixed at training time. Thus, the model statistics are post-conditioned to match the power of the instantaneous speech acoustics at runtime. This results in an improved likelihood of predicting the correct speech command at farther distances. We experiment with real data recorded inside two rooms. Experimental evaluation shows that voice recognition using our method is more robust to changes in distance than the conventional approach. Under the most acoustically challenging condition in our experiments (Room 2, 2.5 meters), our method achieved a 24.2% improvement in recognition performance over the best-performing conventional method.
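The abstract describes a two-stage runtime procedure: pick the distance whose GMM best explains the incoming features, then correct the frozen generic model with statistics precomputed for that distance. The sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. It assumes diagonal-covariance GMMs over per-frame log-power features, assumes the distance "prior" is an additive per-dimension offset applied to the generic model's Gaussian means (the paper's actual correction of model statistics may differ), and omits the microphone array front end entirely. All names (`DiagGMM`, `select_distance`, `compensate_means`, `compensate_for_distance`) and all numbers in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM over per-frame feature vectors (hypothetical container)."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_loglik(self, x: Frame) -> float:
        """log sum_k w_k * N(x; mu_k, diag(var_k)), evaluated with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        peak = max(comps)
        return peak + math.log(sum(math.exp(c - peak) for c in comps))

    def utterance_loglik(self, frames: Sequence[Frame]) -> float:
        # Frames are treated as independent, so per-frame log-likelihoods add up.
        return sum(self.frame_loglik(f) for f in frames)


def select_distance(frames: Sequence[Frame], distance_gmms: Dict[float, DiagGMM]) -> float:
    """Return the distance (dict key) whose GMM gives the highest utterance log-likelihood."""
    return max(distance_gmms, key=lambda d: distance_gmms[d].utterance_loglik(frames))


def compensate_means(generic_means: List[List[float]], offset: Sequence[float]) -> List[List[float]]:
    """Shift every Gaussian mean of the generic model by a per-dimension offset.

    ASSUMPTION: the distance prior is modeled as an additive bias in the
    log-power feature domain; the paper's actual correction may differ.
    """
    return [[m + o for m, o in zip(mu, offset)] for mu in generic_means]


def compensate_for_distance(
    frames: Sequence[Frame],
    distance_gmms: Dict[float, DiagGMM],
    distance_offsets: Dict[float, List[float]],
    generic_means: List[List[float]],
) -> Tuple[float, List[List[float]]]:
    """Stage 1: pick the most likely distance. Stage 2: correct the frozen generic means."""
    d = select_distance(frames, distance_gmms)
    return d, compensate_means(generic_means, distance_offsets[d])


if __name__ == "__main__":
    # Toy 1-D example with made-up values: speech recorded farther away has lower log power.
    gmms = {0.5: DiagGMM([1.0], [[0.0]], [[1.0]]),
            2.5: DiagGMM([1.0], [[-6.0]], [[1.0]])}
    offsets = {0.5: [0.0], 2.5: [-6.0]}
    print(compensate_for_distance([[-5.8], [-6.1]], gmms, offsets, [[1.0], [3.0]]))
    # prints (2.5, [[-5.0], [-3.0]])
```

Keeping the distance classifier and the compensation step as separate functions mirrors the abstract's separation between predicting the distance and selecting the corresponding priors, so either stage could be swapped for a different model (for example, full-covariance GMMs or a variance correction) without touching the other.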
Keywords—Human Machine Interaction, Human Computer Interaction, Voice Recognition, Acoustic Model Compensation, Acoustic


Similar resources

A Fast, Robust, Automatic Blink Detector

Introduction “Blink” is defined as closing and opening of the eyes in a small duration of time. In this study, we aimed to introduce a fast, robust, vision-based approach for blink detection. Materials and Methods This approach consists of two steps. In the first step, the subject’s face is localized every second and with the first blink, the system detects the eye’s location and creates an ope...


Robust Adaptive Actuator Failure Compensation of MIMO Systems with Unknown State Delays

In this paper, a robust adaptive actuator failure compensation control scheme is proposed for a class of multi input multi output linear systems with unknown time-varying state delay and in the presence of unknown actuator failures and external disturbance. The adaptive controller structure is designed based on the SPR-Lyapunov approach to achieve the control objective under the specific assump...


Automatic Sperm Analysis in Microscopic Images of Human Semen: Segmentation Using Minimization of Information Distance

Introduction The morphologic features of human sperms are key indicators for monitoring fertility problems in men. Therefore, automated analyzing methods via microscopic videos have become the most favorite policy in infertility treatment during the last decades. Materials and Methods In the proposed method, firstly a hypothesis testing framework was defined to distinguish sperms from backgroun...


A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Publication date: 2013